Collaborating Authors

probabilistic value

One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently

Li, Weida, Yu, Yaoliang

arXiv.org Artificial Intelligence

The concept of probabilistic values, such as Beta Shapley values and weighted Banzhaf values, has gained recent attention in applications like feature attribution and data valuation. However, exact computation of these values is often exponentially expensive, necessitating approximation techniques. Prior research has shown that the choice of probabilistic values significantly impacts downstream performance, with no universally superior option. Consequently, one may have to approximate multiple candidates and select the best-performing one. Although there have been many efforts to develop efficient estimators, none are intended to approximate all probabilistic values both simultaneously and efficiently. In this work, we embark on the first exploration of achieving this goal. Adhering to the principle of maximum sample reuse, we propose a one-sample-fits-all framework parameterized by a sampling vector to approximate intermediate terms that can be converted to any probabilistic value without amplifying scalars. Leveraging the concept of $ (\epsilon, \delta) $-approximation, we theoretically identify a key formula that effectively determines the convergence rate of our framework. By optimizing the sampling vector using this formula, we obtain i) a one-for-all estimator that achieves the currently best time complexity for all probabilistic values on average, and ii) a faster generic estimator with the sampling vector optimally tuned for each probabilistic value. Particularly, our one-for-all estimator achieves the fastest convergence rate on Beta Shapley values, including the well-known Shapley value, both theoretically and empirically. Finally, we establish a connection between probabilistic values and the least square regression used in (regularized) datamodels, showing that our one-for-all estimator can solve a family of datamodels simultaneously.
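The Shapley value discussed above is the best-known probabilistic value, and its exact computation requires evaluating the game on exponentially many coalitions. The following minimal Monte Carlo sketch (a generic permutation-sampling estimator, not the paper's one-sample-fits-all framework, applied to a made-up additive toy game) illustrates the kind of approximation such estimators perform:

```python
import random

def monte_carlo_shapley(game, n, num_samples=2000, seed=0):
    """Estimate the Shapley values of an n-player cooperative game by
    sampling random permutations and averaging each player's marginal
    contribution to the coalition of players preceding it."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        coalition = set()
        prev = game(coalition)
        for i in perm:
            coalition.add(i)
            cur = game(coalition)
            phi[i] += cur - prev  # marginal contribution of player i
            prev = cur
    return [p / num_samples for p in phi]

# Toy game: a coalition's value is the sum of fixed player weights.
# The game is additive, so each player's Shapley value is its own weight.
weights = [1.0, 2.0, 3.0]
est = monte_carlo_shapley(lambda S: sum(weights[i] for i in S), n=3)
```

For this additive toy game every marginal contribution of player i equals its weight, so the estimate matches the exact Shapley values; for general games the estimate only converges as the number of sampled permutations grows, which is the cost the paper's estimators aim to reduce.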


The Application of Affective Measures in Text-based Emotion Aware Recommender Systems

Leung, John Kalung, Griva, Igor, Kennedy, William G., Kinser, Jason M., Park, Sohyun, Lee, Seo Young

arXiv.org Artificial Intelligence

This paper presents an innovative approach to two problems researchers face in Emotion Aware Recommender Systems (EARS): the difficulty of collecting voluminous, good-quality emotion-tagged datasets, and the lack of an effective way to protect users' emotional data privacy. Without enough good-quality emotion-tagged datasets, researchers cannot conduct repeatable affective computing research in EARS that generates personalized recommendations based on users' emotional preferences. Similarly, if we fail to fully protect users' emotional data privacy, users may resist engaging with EARS services. This paper introduces a method that detects affective features in subjective passages using Generative Pre-trained Transformer technology, forming the basis of the Affective Index and Affective Index Indicator (AII), and eliminating the need for users to build their own affective feature detection mechanism. The paper advocates a separation-of-responsibility approach in which users protect their emotional profile data while EARS service providers refrain from retaining or storing it. Service providers can update users' Affective Indices in memory without saving their private data, providing affective-aware recommendations without compromising user privacy. This paper offers a solution to the subjectivity and variability of emotions, data privacy concerns, and evaluation metrics and benchmarks, paving the way for future EARS research.


Logistic Regression -- An Overview with an Example

#artificialintelligence

Known for its simplicity, the Logistic Regression algorithm is reliable and extremely useful, which is why, when it comes to binary classification problems, Logistic Regression is many engineers' go-to choice. Logistic Regression uses the sigmoid function to output continuous probabilistic values between 0 and 1 for any value of its independent variables, and these probabilistic values are then compared against a threshold of 0.5. Any value greater than 0.5 is classified into the "1" class, and any value less than 0.5 is classified into the "0" class, i.e., the class in which the particular event does not take place. A common question people ask is: if Logistic Regression is used for classification problems, why does it have the term "Regression" in its name? And why can't we use Linear Regression instead of Logistic Regression for classification problems? The answer to the first question is that even though Logistic Regression is used for binary classification, the output of the sigmoid equation is still a continuous numerical value.
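The sigmoid-plus-threshold mechanism described above can be sketched in a few lines. This is a minimal illustration with made-up weights and inputs, not a trained model:

```python
import math

def sigmoid(z):
    # Maps any real number to a probability in the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias, threshold=0.5):
    # Linear combination of the independent variables, squashed by the
    # sigmoid into a probability, then compared against the threshold
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    return (1 if p >= threshold else 0), p

# Hypothetical example: two features with arbitrary illustrative weights
label, prob = predict([2.0, -1.0], weights=[0.8, 0.3], bias=-0.5)
```

Here `z = 0.8` and the sigmoid yields a probability of roughly 0.69, so the example is assigned to the "1" class; the continuous probability `prob` is exactly the "regression" output the last paragraph refers to.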